
    Analysing Data-to-Text Generation Benchmarks

    Recently, several datasets associating data with text have been created to train data-to-text surface realisers. It is unclear, however, to what extent the surface realisation task exercised by these datasets is linguistically challenging. Do these datasets provide enough variety to encourage the development of generic, high-quality data-to-text surface realisers? In this paper, we argue that these datasets have important drawbacks. We back up our claim using statistics, metrics and manual evaluation. We conclude by eliciting a set of criteria for the creation of a data-to-text benchmark which could better support the development, evaluation and comparison of linguistically sophisticated data-to-text surface realisers.
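
    As a concrete illustration of the kind of statistics such an analysis relies on, the sketch below computes two simple diversity measures over a toy corpus of (triple set, text) pairs: the distribution of input shapes (multisets of relation names) and the type-token ratio of the reference texts. The corpus format and the two measures are illustrative assumptions, not the paper's actual protocol.

```python
# Toy dataset-diversity statistics for a data-to-text corpus (a sketch,
# assuming a list of (triples, text) pairs rather than any real benchmark).
from collections import Counter

def input_patterns(corpus):
    """Map each input to its 'shape': the multiset of relation names."""
    return Counter(tuple(sorted(rel for _, rel, _ in triples))
                   for triples, _ in corpus)

def type_token_ratio(corpus):
    """Lexical diversity of the reference texts (higher = more varied)."""
    tokens = [tok for _, text in corpus for tok in text.lower().split()]
    return len(set(tokens)) / len(tokens) if tokens else 0.0

corpus = [
    ([("Alan_Bean", "birthPlace", "Wheeler")], "Alan Bean was born in Wheeler."),
    ([("Alan_Bean", "mission", "Apollo_12")], "Alan Bean flew on Apollo 12."),
]
print(input_patterns(corpus))
print(f"TTR: {type_token_ratio(corpus):.2f}")
```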

    Learning Embeddings to lexicalise RDF Properties

    A difficult task when generating text from knowledge bases (KB) consists in finding appropriate lexicalisations for KB symbols. We present an approach for lexicalising knowledge base relations and apply it to DBpedia data. Our model learns low-dimensional embeddings of words and RDF resources and uses these representations to score RDF properties against candidate lexicalisations. Training our model on (i) pairs of RDF triples and automatically generated verbalisations of these triples and (ii) pairs of paraphrases extracted from various resources yields competitive results on DBpedia data.
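
    A minimal sketch of the scoring step the abstract describes: candidate lexicalisations are ranked against an RDF property by cosine similarity of their embeddings. The hand-set toy vectors stand in for the jointly learned word/resource embeddings; names such as dbo:birthPlace are only illustrative.

```python
# Scoring candidate lexicalisations for an RDF property by embedding
# similarity (a sketch; the paper learns these vectors from verbalised
# triples and paraphrase pairs, whereas these are hand-set toys).
import numpy as np

embeddings = {
    "dbo:birthPlace":  np.array([0.9, 0.1, 0.0]),
    "was born in":     np.array([0.8, 0.2, 0.1]),
    "is the mayor of": np.array([0.1, 0.9, 0.3]),
}

def score(prop, phrase):
    """Cosine similarity between a property vector and a phrase vector."""
    a, b = embeddings[prop], embeddings[phrase]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

candidates = ["was born in", "is the mayor of"]
best = max(candidates, key=lambda c: score("dbo:birthPlace", c))
print(best)  # -> "was born in"
```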

    A Statistical, Grammar-Based Approach to Microplanning

    While there has been much work in recent years on data-driven natural language generation, little attention has been paid to the fine-grained interactions that arise during microplanning between aggregation, surface realisation and sentence segmentation. In this paper, we propose a hybrid symbolic/statistical approach to jointly model these interactions. Our approach integrates a small handwritten grammar, a statistical hypertagger and a surface realisation algorithm. It is applied to the verbalisation of knowledge base queries and tested on 13 knowledge bases to demonstrate domain independence. We evaluate our approach in several ways. A quantitative analysis shows that the hybrid approach outperforms a purely symbolic approach in terms of both speed and coverage. Results from a human study indicate that users find the output of this hybrid statistical/symbolic system more fluent than both a template-based and a purely symbolic grammar-based approach. Finally, we illustrate by means of examples that our approach can account for various factors impacting aggregation, sentence segmentation and surface realisation.
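
    The skeleton below sketches the shape of such a hybrid pipeline: a statistical scorer (the hypertagger's role) prunes the grammar trees considered for each input symbol before the symbolic realiser combines them. The grammar, the toy scorer and the naive left-to-right concatenation are simplifying assumptions; the actual system performs full TAG-based realisation.

```python
# Hybrid statistical/symbolic pipeline skeleton (a sketch; all names,
# the toy grammar and the scorer are illustrative assumptions).

def hypertag(symbol, grammar, model, k=2):
    """Keep only the k grammar trees the statistical model scores highest."""
    return sorted(grammar[symbol], key=lambda t: model(symbol, t),
                  reverse=True)[:k]

def realise(symbols, grammar, model):
    """Pick the top tree per symbol and concatenate (real TAG combination omitted)."""
    return " ".join(hypertag(s, grammar, model, k=1)[0] for s in symbols)

grammar = {"capital": ["the capital of", "whose capital is"],
           "France": ["France"]}
model = lambda sym, tree: -len(tree)  # toy scorer: prefer shorter trees
print(realise(["capital", "France"], grammar, model))  # -> the capital of France
```

    Pruning before realisation is what buys the speed and coverage gains the quantitative analysis reports: the symbolic search space shrinks before any combination is attempted.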

    Creating Training Corpora for NLG Micro-Planning

    In this paper, we focus on how to create data-to-text corpora which can support the learning of wide-coverage microplanners, i.e., generation systems that handle lexicalisation, aggregation, surface realisation, sentence segmentation and referring expression generation. We start by reviewing common practice in designing training benchmarks for Natural Language Generation. We then present a novel framework for semi-automatically creating linguistically challenging NLG corpora from existing Knowledge Bases. We apply our framework to DBpedia data and compare the resulting dataset with (Wen et al., 2016)'s dataset. We show that while (Wen et al., 2016)'s dataset is more than twice as large as ours, it is less diverse both in terms of input and in terms of text. We thus propose our corpus generation framework as a novel method for creating challenging datasets from which NLG models capable of generating text from KB data can be learned.
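
    One ingredient of such a framework is building candidate inputs to verbalise. The sketch below enumerates small connected sets of triples about a single entity, which could then be passed to human annotators; the in-memory triple list and the size bound are illustrative assumptions, not the paper's selection procedure.

```python
# Enumerating candidate data units for verbalisation (a sketch, assuming
# an in-memory triple list rather than a live DBpedia endpoint).
from itertools import combinations

triples = [
    ("Alan_Bean", "birthPlace", "Wheeler"),
    ("Alan_Bean", "occupation", "Test_pilot"),
    ("Alan_Bean", "mission", "Apollo_12"),
]

def data_units(entity, triples, max_size=2):
    """All triple sets about `entity`, up to max_size triples each."""
    about = [t for t in triples if t[0] == entity]
    units = []
    for n in range(1, max_size + 1):
        units.extend(combinations(about, n))
    return units

for unit in data_units("Alan_Bean", triples):
    print(unit)
```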

    Deep Graph Convolutional Encoders for Structured Data to Text Generation

    Most previous work on neural text generation from graph-structured data relies on standard sequence-to-sequence methods. These approaches linearise the input graph to be fed to a recurrent neural network. In this paper, we propose an alternative encoder based on graph convolutional networks that directly exploits the input structure. We report results on two graph-to-sequence datasets that empirically show the benefits of explicitly encoding the input graph structure. (INLG 2018)
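
    A minimal sketch of the graph convolutional layer such an encoder builds on: each node's new representation mixes its neighbours' features through a self-loop-augmented, degree-normalised adjacency matrix. The single layer, shapes and random inputs are illustrative, not the paper's full encoder.

```python
# One graph convolutional layer (a sketch of the building block, not the
# paper's encoder; shapes and inputs are arbitrary toys).
import torch

def gcn_layer(X, A, W):
    """X: [n, d] node features, A: [n, n] adjacency, W: [d, d_out] weights."""
    A_hat = A + torch.eye(A.size(0))          # add self-loops
    deg = A_hat.sum(dim=1)                    # node degrees
    D_inv_sqrt = torch.diag(deg.pow(-0.5))    # D^{-1/2}
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt  # symmetric normalisation
    return torch.relu(A_norm @ X @ W)

n, d, d_out = 4, 8, 16
X = torch.randn(n, d)
A = torch.tensor([[0, 1, 0, 0], [1, 0, 1, 0],
                  [0, 1, 0, 1], [0, 0, 1, 0]], dtype=torch.float)
W = torch.randn(d, d_out)
print(gcn_layer(X, A, W).shape)  # torch.Size([4, 16])
```

    Stacking such layers lets information flow along graph edges directly, which is precisely what a linearised sequence-to-sequence encoder discards.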

    Bootstrapping Generators from Noisy Data

    A core step in statistical data-to-text generation concerns learning correspondences between structured data representations (e.g., facts in a database) and associated texts. In this paper we aim to bootstrap generators from large-scale datasets where the data (e.g., DBpedia facts) and related texts (e.g., Wikipedia abstracts) are loosely aligned. We tackle this challenging task by introducing a special-purpose content selection mechanism. We use multi-instance learning to automatically discover correspondences between data and text pairs and show how these can be used to enhance the content signal while training an encoder-decoder architecture. Experimental results demonstrate that models trained with content-specific objectives improve upon a vanilla encoder-decoder which solely relies on soft attention. (NAACL 2018)
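
    The sketch below illustrates the multi-instance intuition in its simplest form: each fact is scored against every sentence of the loosely aligned text, and facts with no well-matching sentence can be down-weighted. The word-overlap scorer is a naive stand-in for the learned alignment model.

```python
# Naive fact-sentence alignment scoring (a sketch of the multi-instance
# idea; the paper learns this alignment rather than using word overlap).
def overlap(fact, sentence):
    """Fraction of the fact's words that appear in the sentence."""
    fact_words = set(" ".join(fact).replace("_", " ").lower().split())
    sent_words = set(sentence.lower().split())
    return len(fact_words & sent_words) / len(fact_words)

facts = [("Alan_Bean", "birth place", "Wheeler"),
         ("Alan_Bean", "alma mater", "UT_Austin")]
sentences = ["alan bean was born in wheeler , texas ."]

for fact in facts:
    best = max(overlap(fact, s) for s in sentences)
    print(fact, f"best sentence score: {best:.2f}")
```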

    Using FB-LTAG Derivation Trees to Generate Transformation-Based Grammar Exercises

    Using a Feature-Based Lexicalised Tree Adjoining Grammar (FB-LTAG), we present an approach for generating pairs of sentences that are related by a syntactic transformation, and we apply this approach to create language learning exercises. We argue that the derivation trees of an FB-LTAG provide a good level of representation for capturing syntactic transformations. We relate our approach to previous work on sentence reformulation, question generation and grammar exercise generation. We evaluate precision and linguistic coverage, and we demonstrate the genericity of the proposal by applying it to a range of transformations including the passive/active transformation, the pronominalisation of an NP, the assertion/yes-no question relation and the assertion/wh-question transformation.
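
    To make the notion of a transformation pair concrete, the toy sketch below maps an active clause to its passive counterpart over a flat (subject, verb, object) record. This is a deliberate simplification: the paper operates on FB-LTAG derivation trees rather than such flat tuples, and the tiny participle lexicon is invented.

```python
# Toy active-to-passive transformation pair (a sketch; the paper's
# transformations operate on derivation trees, not flat tuples).
PAST_PARTICIPLE = {"wrote": "written", "ate": "eaten"}  # invented toy lexicon

def passivise(clause):
    """Map an active (subj, verb, obj) clause to its passive counterpart."""
    subj, verb, obj = clause
    return (obj, "was " + PAST_PARTICIPLE[verb], "by " + subj)

active = ("Mary", "wrote", "the letter")
passive = passivise(active)
print(" ".join(active) + ".")   # Mary wrote the letter.
print(" ".join(passive) + ".")  # the letter was written by Mary.
```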

    Generating Grammar Exercises

    Grammar exercises for language learning fall into two distinct classes: those that are based on "real-life sentences" extracted from existing documents or from the web; and those that seek to facilitate language acquisition by presenting the learner with exercises whose syntax is as simple as possible and whose vocabulary is restricted to that contained in the textbook being used. In this paper, we introduce a framework (called GramEx) which permits generating the second type of grammar exercise. Using generation techniques, we show that a grammar can be used to semi-automatically generate grammar exercises which target a specific learning goal, are made of short, simple sentences, and whose vocabulary is restricted to that used in a given textbook.
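
    A minimal sketch of the vocabulary-restriction idea: candidate sentences are kept only if all their words appear in the textbook word list, then turned into fill-in-the-blank items targeting a learning goal. The word list, sentences and blanking rule are illustrative assumptions, not GramEx's grammar-based generation.

```python
# Vocabulary-filtered fill-in-the-blank exercises (a sketch; the sentence
# list stands in for grammar-generated output, and the vocab is a toy).
TEXTBOOK_VOCAB = {"the", "cat", "dog", "sleeps", "runs"}

def make_exercises(sentences, target_words):
    items = []
    for sent in sentences:
        words = sent.lower().rstrip(".").split()
        if not set(words) <= TEXTBOOK_VOCAB:
            continue  # uses vocabulary outside the textbook: skip
        for w in words:
            if w in target_words:  # blank out the targeted word
                items.append((sent.replace(w, "_____"), w))
    return items

sentences = ["The cat sleeps.", "The professor lectures."]
for blanked, answer in make_exercises(sentences, {"sleeps", "runs"}):
    print(blanked, "->", answer)  # The cat _____. -> sleeps
```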

    Using Regular Tree Grammars to Enhance Sentence Realisation

    Feature-based regular tree grammars (FRTG) can be used to generate the derivation trees of a feature-based tree adjoining grammar (FTAG). We make use of this fact to specify and implement both an FTAG-based sentence realiser and a benchmark generator for this realiser. We argue furthermore that the FRTG encoding enables us to improve on other proposals based on a grammar of TAG derivation trees in several ways: it preserves the compositional semantics that can be encoded in feature-based TAGs; it increases efficiency and restricts overgeneration; and it provides a uniform resource for generation, benchmark construction, and parsing.
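
    The toy regular tree grammar below makes the underlying idea concrete: rules rewrite a nonterminal into a labelled tree whose leaves may themselves be nonterminals, and enumerating all derivations yields a space of derivation trees. The grammar is invented and omits the feature structures that an FRTG adds.

```python
# Enumerating the trees of a toy regular tree grammar (a sketch; real
# FRTGs add feature structures, which this invented grammar omits).
RULES = {
    "S":  [("s", ["NP", "VP"])],
    "NP": [("john", []), ("mary", [])],
    "VP": [("sleeps", []), ("sees", ["NP"])],
}

def derive(symbol):
    """Yield every tree (label, children) derivable from `symbol`."""
    for label, kids in RULES[symbol]:
        if not kids:
            yield (label, [])
        elif len(kids) == 1:
            for t in derive(kids[0]):
                yield (label, [t])
        else:  # exactly two children in this toy grammar
            for left in derive(kids[0]):
                for right in derive(kids[1]):
                    yield (label, [left, right])

print(sum(1 for _ in derive("S")))  # 6 derivation trees
```

    The same enumeration serves two purposes at once, which is the uniformity the abstract points to: read forward it is a benchmark generator, read as a search space it underlies realisation.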

    Building RDF Content for Data-to-Text Generation

    In Natural Language Generation (NLG), one important limitation is the lack of common benchmarks on which to train, evaluate and compare data-to-text generators. In this paper, we take a step in that direction and introduce a method for automatically creating an arbitrarily large repertoire of data units that could serve as input for generation. Using both automated metrics and a human evaluation, we show that the data units produced by our method are both diverse and coherent.
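
    One plausible reading of "coherent data units" is connected sets of triples; the sketch below grows a unit from a seed entity by repeatedly adding a triple that shares an entity with the unit built so far. The toy graph and the random growth strategy are assumptions, not the paper's actual method.

```python
# Growing a connected data unit from a seed entity (a sketch under the
# assumption that coherence means connectedness; the graph is a toy).
import random

TRIPLES = [
    ("Alan_Bean", "mission", "Apollo_12"),
    ("Apollo_12", "operator", "NASA"),
    ("Alan_Bean", "birthPlace", "Wheeler"),
    ("NASA", "headquarters", "Washington_DC"),
]

def grow_unit(seed, triples, size, rng=random.Random(0)):
    """Add triples that touch the unit's entities until `size` is reached."""
    unit, entities = [], {seed}
    pool = list(triples)
    while len(unit) < size:
        linked = [t for t in pool if t[0] in entities or t[2] in entities]
        if not linked:
            break  # nothing left that keeps the unit connected
        t = rng.choice(linked)
        unit.append(t)
        entities.update((t[0], t[2]))
        pool.remove(t)
    return unit

print(grow_unit("Alan_Bean", TRIPLES, size=3))
```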